15 research outputs found

    The MADP Toolbox: An Open-Source Library for Planning and Learning in (Multi-)Agent Systems

    This article describes the MultiAgent Decision Process (MADP) toolbox, a software library to support planning and learning for intelligent agents and multiagent systems in uncertain environments. Some of its key features are that it supports partially observable environments and stochastic transition models; has unified support for single- and multiagent systems; provides a large number of models for decision-theoretic decision making, including one-shot decision making (e.g., Bayesian games) and sequential decision making under various assumptions of observability and cooperation, such as Dec-POMDPs and POSGs; provides tools and parsers to quickly prototype new problems; provides an extensive range of planning and learning algorithms for single- and multiagent systems; and is written in C++ and designed to be extensible via the object-oriented paradigm.
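
    Since the toolbox is built around decision-theoretic models such as Dec-POMDPs, a minimal sketch of the kind of model structure such a library manipulates may help. The Python below is purely illustrative and does not reflect the MADP Toolbox's actual C++ API; all class and field names are hypothetical.

```python
# Illustrative only -- NOT the MADP Toolbox API. A toy container for the kind of
# model the toolbox supports: a Dec-POMDP tuple <S, {A_i}, T, R, {O_i}, Z, h>.
from dataclasses import dataclass, field

@dataclass
class DecPOMDP:
    states: list                       # S: environment states
    agent_actions: list                # {A_i}: one action set per agent
    agent_observations: list           # {O_i}: one observation set per agent
    horizon: int                       # h: planning horizon
    transitions: dict = field(default_factory=dict)     # T[(s, joint_a)] -> {s': prob}
    rewards: dict = field(default_factory=dict)          # R[(s, joint_a)] -> shared reward
    observation_fn: dict = field(default_factory=dict)   # Z[(joint_a, s')] -> {joint_o: prob}

# A trivial two-agent instance; a planner would search over joint policies of length `horizon`.
tiny = DecPOMDP(
    states=["s0", "s1"],
    agent_actions=[["left", "right"], ["left", "right"]],
    agent_observations=[["bump", "clear"], ["bump", "clear"]],
    horizon=3,
)
```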

    Parameter-Independent Strategies for pMDPs via POMDPs

    Markov Decision Processes (MDPs) are a popular class of models suitable for solving control decision problems in probabilistic reactive systems. We consider parametric MDPs (pMDPs) that include parameters in some of the transition probabilities to account for stochastic uncertainties of the environment such as noise or input disturbances. We study pMDPs with reachability objectives where the parameter values are unknown and impossible to measure directly during execution, but a probability distribution over the parameter values is known. We study, for the first time, the computation of parameter-independent strategies that are expectation optimal, i.e., that optimize the expected reachability probability under the probability distribution over the parameters. We present an encoding of our problem into partially observable MDPs (POMDPs), i.e., a reduction of our problem to computing optimal strategies in POMDPs. We evaluate our method experimentally on several benchmarks: a motivating (repeated) learner model; a series of benchmarks of varying configurations of a robot moving on a grid; and a consensus protocol. Comment: Extended version of a QEST 2018 paper.
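
    To make the reduction concrete, the sketch below shows one way a pMDP with finitely many parameter values and a known prior can be recast as a POMDP: the unknown parameter is folded into the hidden state, transitions apply the parametric dynamics at that fixed parameter, and observations reveal only the MDP state. All names are hypothetical; this is a sketch of the idea, not the authors' encoding verbatim.

```python
# Hypothetical sketch of the pMDP -> POMDP reduction discussed above.
# pmdp_trans(s, a, theta) -> {s': prob} evaluates the parametric transition
# function at a concrete parameter value theta; prior maps theta -> probability.

def pmdp_to_pomdp(states, pmdp_trans, prior, init_dist):
    """Return the pieces of a POMDP whose hidden state is (s, theta).
    The parameter theta never changes and is never observed, so any POMDP
    strategy conditions only on the observable history and is therefore
    parameter-independent in the sense used above."""
    pomdp_states = [(s, theta) for s in states for theta in prior]

    # Initial belief: product of the initial-state distribution and the prior.
    belief = {(s, th): init_dist[s] * prior[th] for (s, th) in pomdp_states}

    # Transition: apply the pMDP dynamics at the fixed theta; theta persists.
    def transition(state, action):
        s, th = state
        return {(s2, th): p for s2, p in pmdp_trans(s, action, th).items()}

    # Observation: only the MDP component of the state is visible.
    def observe(state):
        s, _theta = state
        return s

    return pomdp_states, belief, transition, observe
```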

    Confidence in uncertainty: Error cost and commitment in early speech hypotheses

    Interactions with artificial agents often lack immediacy because agents respond more slowly than their users expect. Automatic speech recognisers introduce this delay by analysing a user’s utterance only after it has been completed. Early, uncertain hypotheses of incremental speech recognisers can enable artificial agents to respond in a more timely manner. However, these hypotheses may change significantly with each update, so an already initiated action may turn into an error and incur error cost. We investigated whether humans would use uncertain hypotheses for planning ahead and/or initiating their response. We designed a Ghost-in-the-Machine study in a bar scenario: a human participant controlled a bartending robot and perceived the scene only through its recognisers. The results showed that participants used uncertain hypotheses for selecting the best matching action, comparable to computing the utility of dialogue moves. Participants evaluated the available evidence and the error cost of their actions prior to initiating them. If the error cost was low, participants initiated their response with only suggestive evidence. Otherwise, they waited for additional, more confident hypotheses if they still had time to do so. If there was time pressure but little evidence, participants grounded their understanding with echo questions. These findings contribute to a psychologically plausible policy for human-robot interaction that enables artificial agents to respond in a more timely and socially appropriate manner under uncertainty.
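
    The commitment behaviour described above amounts to a cost-sensitive decision rule. The sketch below is a loose, hypothetical formalization of that policy; the thresholds and function names are assumptions for illustration, not values reported in the study.

```python
# Loose, hypothetical formalization of the commitment policy described above:
# act early when errors are cheap, wait for confidence when they are not, and
# fall back to an echo question under time pressure with weak evidence.

def choose_move(confidence, error_cost, time_left,
                suggestive=0.3, confident=0.7, cheap=1.0, min_wait=2.0):
    if error_cost <= cheap and confidence >= suggestive:
        return "act"              # cheap error: suggestive evidence is enough
    if confidence >= confident:
        return "act"              # confident hypothesis: commit despite the cost
    if time_left >= min_wait:
        return "wait"             # costly error, time available: await better hypotheses
    return "echo_question"        # time pressure + weak evidence: ground understanding

# Example: a costly action, weak evidence, little time left -> ask back.
print(choose_move(confidence=0.4, error_cost=5.0, time_left=0.5))  # echo_question
```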

    Exploiting submodular value functions for scaling up active perception

    In active perception tasks, an agent aims to select sensory actions that reduce its uncertainty about one or more hidden variables. For example, a mobile robot takes sensory actions to efficiently navigate in a new environment. While partially observable Markov decision processes (POMDPs) provide a natural model for such problems, reward functions that directly penalize uncertainty in the agent’s belief can remove the piecewise-linear and convex (PWLC) property of the value function required by most POMDP planners. Furthermore, as the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially with it, making POMDP planning infeasible with traditional methods. In this article, we address the twofold challenge of modeling and planning for active perception tasks. We analyze ρPOMDP and POMDP-IR, two frameworks for modeling active perception tasks that restore the PWLC property of the value function. We show the mathematical equivalence of these two frameworks by showing that a ρPOMDP along with a policy can be reduced to a POMDP-IR and an equivalent policy (and vice versa). We prove that the value function for the given ρPOMDP (and the given policy) and the reduced POMDP-IR (and the reduced policy) is the same. To efficiently plan for active perception tasks, we identify and exploit the independence properties of POMDP-IR to reduce the computational cost of solving POMDP-IR (and ρPOMDP). We propose greedy point-based value iteration (PBVI), a new POMDP planning method that uses greedy maximization to greatly improve scalability in the action space of an active perception POMDP. Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function. We establish the conditions under which the value function of an active perception POMDP is guaranteed to be submodular. Finally, we present a detailed empirical analysis on a dataset collected from a multi-camera tracking system employed in a shopping mall. Our method achieves similar performance to existing methods but at a fraction of the computational cost, leading to better scalability for solving active perception tasks.
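
    The key ingredient of greedy PBVI is greedy maximization over sensing actions, which enjoys bounded error when the objective is monotone and submodular. The sketch below shows generic greedy subset selection under an assumed budget; it is illustrative only and not the authors' implementation.

```python
# Generic greedy maximization over candidate sensing actions, the idea behind
# greedy PBVI's action selection. `value(subset)` is assumed monotone and
# (approximately) submodular; the budget and names are illustrative only.

def greedy_select(candidates, value, budget):
    """Add, up to `budget` times, the action with the largest marginal gain.
    For monotone submodular `value`, this achieves at least (1 - 1/e) of the
    optimal achievable value."""
    chosen, remaining = [], list(candidates)
    for _ in range(budget):
        base = value(chosen)
        best, best_gain = None, 0.0
        for c in remaining:
            gain = value(chosen + [c]) - base
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:          # no remaining action improves the objective
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy coverage objective: each camera observes a set of cells; pick two cameras
# that together cover the most cells.
coverage = {"cam1": {1, 2}, "cam2": {2, 3}, "cam3": {4}}
value = lambda subset: len(set().union(*[coverage[c] for c in subset]))
print(greedy_select(list(coverage), value, budget=2))
```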

    A framework for verifying autonomous robotic agents against environment assumptions
